How to Make a Frenemy: Multitape FSTs for Portmanteau Generation

Authors

  • Aliya Deri
  • Kevin Knight
Abstract

A portmanteau is a type of compound word that fuses the sounds and meanings of two component words; for example, “frenemy” (friend + enemy) or “smog” (smoke + fog). We develop a system, including a novel multitape FST, that takes an input of two words and outputs possible portmanteaux. Our system is trained on a list of known portmanteaux and their component words, and achieves 45% exact matches in cross-validated experiments.
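To make the task's input and output concrete, here is a minimal sketch that enumerates candidate blends of two words. It is not the multitape FST described in the abstract: it naively joins every non-empty prefix of the first word to every non-empty suffix of the second, works on spelling rather than sounds, and assigns no weights. The function name `candidate_blends` is hypothetical and does not come from the paper.

```python
def candidate_blends(first: str, second: str) -> list[str]:
    """Naive, unweighted candidates: prefix of `first` + suffix of `second`.

    Illustrative assumption only; the system in the abstract uses a trained
    multitape FST, which this sketch does not attempt to reproduce.
    """
    candidates = set()
    for i in range(1, len(first) + 1):      # non-empty prefix of the first word
        for j in range(len(second)):        # non-empty suffix of the second word
            blend = first[:i] + second[j:]
            if blend not in (first, second):  # skip trivial outputs
                candidates.add(blend)
    return sorted(candidates)


if __name__ == "__main__":
    print("frenemy" in candidate_blends("friend", "enemy"))
    print("smog" in candidate_blends("smoke", "fog"))
```

A sketch like this overgenerates heavily, which is why a system such as the one described needs training data (known portmanteaux and their component words) to rank candidates.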


Similar resources

Rectilinear Full Steiner Tree Generation

The fastest exact algorithm (in practice) for the rectilinear Steiner tree problem in the plane uses a two-phase scheme: first, a small but sufficient set of full Steiner trees (FSTs) is generated, and then a Steiner minimum tree is constructed from this set by using simple backtrack search, dynamic programming or an integer programming formulation. FST generation methods can be seen as problem red...


Charmanteau: Character Embedding Models For Portmanteau Creation

Portmanteaus are a word formation phenomenon where two words are combined to form a new word. We propose character-level neural sequence-to-sequence (S2S) methods for the task of portmanteau generation that are end-to-end trainable, language independent, and do not explicitly use additional phonetic information. We propose a noisy-channel-style model, which allows for the incorporation of unsuperv...


EM training of finite-state transducers and its application to pronunciation modeling

Recently, finite-state transducers (FSTs) have been shown to be useful for a number of applications in speech and language processing. FST operations such as composition, determinization, and minimization make manipulating FSTs very simple. In this paper, we present a method to learn weights for arbitrary FSTs using the EM algorithm. We show that this FST EM algorithm is able to learn pronuncia...


TCR-clonotyping-based Analysis of a Frenemy —Public T Cells

Public T cells are characterized by a T-cell population bearing identical T-cell receptor (TCR) amino acid sequences in a majority of individuals. However, due to their out-of-balance production, they may serve as both "friend" and "enemy", i.e. a frenemy in many physiological and pathological activities. Thanks to the development of advanced TCR-clonotyping methods, such as next generation seq...




Publication date: 2015